Novel Approach for Identification of Basic and Effective Reproduction Numbers Illustrated with COVID-19

This paper presents a novel numerical technique for the identification of effective and basic reproduction numbers, Re and R0, for long-term epidemics, using an inverse problem approach. The method is based on the direct integration of the SIR (Susceptible–Infectious–Removed) system of ordinary differential equations and the least-squares method. Simulations were conducted using official COVID-19 data for the United States and Canada, and for the states of Georgia, Texas, and Louisiana, for a period of two years and ten months. The results demonstrate the applicability of the method in simulating the dynamics of the epidemic and reveal an interesting relationship between the number of currently infectious individuals and the effective reproduction number, which is a useful tool for predicting the epidemic dynamics. For all conducted experiments, the results show that the local maximum (and minimum) values of the time-dependent effective reproduction number occur approximately three weeks before the local maximum (and minimum) values of the number of currently infectious individuals. This work provides a novel and efficient approach for the identification of time-dependent epidemics parameters.


Introduction
Infectious diseases have become a particularly important subject of study in recent years due to the COVID-19 epidemic. Daily reports from across the world have enabled scientists to collect data for research into new approaches for studying infectious diseases. Early identification of outbreaks is of major concern to public health authorities [1][2][3][4].
Key parameters of epidemic spread are the basic reproduction number, R 0 and the effective reproduction number, R e . These measure the severity of the infection.

•
The basic reproduction number R 0 is defined in epidemiology as the average number of secondary infections caused by a single infected person in a susceptible population, as found in [5][6][7]. It is used to measure the transmission potential of a disease; • The effective reproduction number R e is defined as the product of the basic reproduction number and the fraction of the host population that is susceptible. It estimates the rate of spread of an epidemic.
In [8] and other works [6,7,[9][10][11][12], the focus was on the basic reproduction number, whereas the authors of [13] estimated the effective reproduction number. They applied their method to the spread of COVID-19 in municipalities within Rio de Janeiro, Brazil. Alvarez et al. [14] proposed a variational technique for inverting the renewal equation, based on two formulations of the renewal equation presented in [15,16], to estimate the daily reproduction number.
The present work proposes a novel approach for the identification of the time-dependent reproduction numbers. It extends the SIR-type model by assuming time-dependent infection and recovery rates [40,41], which are required for the estimation of the reproduction numbers. The method is based on direct integration of the SIR system and an inverse problem approach for solving the resulting equation for the ratio of the infection and recovery rates. The evaluation of the time-dependent reproduction numbers over long time periods gives a tool for investigating the wave formation of the infection curve for the COVID-19 epidemic. This study examines and shows how the time-dependent effective reproduction number compares to the number of infectious individuals over time and gives a novel approach for predicting epidemic waves.

The SIR Model for the Spread of an Infectious Disease
The notations in the SIR model include S(t)-the number of susceptible, I(t)-the number of infectious, R(t)-the number of removed individuals, and t-time. The total population N = S(t) + I(t) + R(t) is considered constant in the above equations.
The infection rate β (0 < β < 1) is the probability that a random infectious person infects a random susceptible person. A major approximation here is the assumption that the population under study is well-mixed, so that every person has an equal probability of coming into contact with every other person. The recovery rate γ (0 < γ < 1) gives the probability that an infectious person recovers.
The SIR model consists of the following system of differential equations: Figure 1 shows the diagram for the SIR model, corresponding to the system (1)-(3). The SIR model assumes that the removed individuals are no longer susceptible or infectious. The number of cases of people recovered from COVID-19 who are re-infected at the present moment is very limited, and the rate cannot be estimated; thus, the possibility of reinfection is not taken into account. Specific details on the assumptions and the adaptive SIR (A-SIR) model can be found in [40,41].

Integrating the SIR System
Equations (1) and (2) do not depend on the function R(t). Therefore, the SIR system can be viewed as consisting of two parts: (i) Equations (1) and (2), and (ii) Equation (3). If the function I(t) is known, the last Equation (3) can be solved by direct integration.
Let the coefficients β and γ be constant, and S(t) > 0, (1) and (2) follows the ordinary differential equation below, connecting the functions I and S (see also [42]): namely, Integrating with respect to S yields the following general solution to Equation (5): representing I as a function of S; • Another way to integrate the system is by excluding dt from Equations (1) and (3). This gives the following ordinary differential equation connecting the functions S(t) and R(t): The general solution to Equation (7) is then as follows: representing R as a function of S.

The Effective and Basic Reproduction Numbers R e and R 0
The conditions under which an epidemic occurs present an important question. An epidemic occurs if the number of infected individuals I(t) is increasing [12,42]. This means that the rate of change of the infectious, dI dt , is positive. From Equation (2), the following is implied: which results in the following condition: The parameter R e (t) is called the effective reproduction number (also referred to as ratio or rate) for a given disease. From a statistical point of view, the effective reproduction number is the expected number of secondary cases produced, in a completely susceptible population, by a typical infectious individual.
Another important characteristic of an infectious disease, the basic reproduction number (ratio, rate) R 0 , can be computed as the ratio of the known rates β and γ over time: where N is the size of the total population.
The basic reproduction number, R 0 , is used for answering two important questions, the first being what fraction of the population would become ill should an epidemic occur, and the second being what is the minimum fraction of the population that must be vaccinated in order to prevent an epidemic.
Diseases with smaller values of R 0 = βN γ are easier to eradicate than those with larger values R 0 , since a population can acquire herd immunity with a smaller fraction of the population vaccinated. For example, smallpox with R 0 ≈ 4 has been eradicated worldwide, while measles with R 0 ≈ 14 still has occasional outbreaks.

The Direct and the Inverse Problem
The initial value problem consisting of the system (1)-(3) with known constant coefficients β and γ, along with proper initial conditions derived from the given data, constitutes the direct problem. In reality, for a new disease, the values of the parameters β and γ are unknown. Finding these parameters involves solving an inverse problem for determining the coefficients and functions from the available data. The inverse problem for the estimation of the constants β and γ in the classical SIR model is solved in [43].

Time-Dependent Infection and Recovery Rates and Reproduction Numbers
The original SIR model assumes that the infection and recovery rates are constants. Equations (1)-(3), with proper initial conditions, allow the determination of I(t), S(t), and R(t), if the coefficients β and γ are known constants.
However, in the case of a pandemic, the rates may vary over time; hence, β = β(t) and γ = γ(t). Multiple factors can cause the rates to change over time. In the case of COVID-19, examples include social distancing, government restrictions, and preventive treatments.
Therefore, the basic reproduction number can be assumed to be a function of time and can be defined: with the effective reproduction number as follows: An algorithm for solving the inverse problem (1) and (2), under the condition that β = β(t) and γ = γ(t) are functions of time, is given in [40]. Knowing β = β(t) and γ = γ(t) allows us to estimate the reproduction numbers. The present work focuses on alternative methods for directly identifying the reproduction numbers from a given dataset, as well as comparisons of the different approaches.

The Inverse Problem for the Time-Dependent Reproduction Numbers
Let the values of S(t) and I(t) at some time moments, ν 1 , ν 2 , . . . , ν m , shown in Figure 2, be known and given by the following: If the population N is constant, the values of R(t) at time moments ν l can be obtained: Assuming N and S(t) are available, the time-dependent reproduction numbers can be determined after the estimation of the ratio γ(t)/β(t).
The time nodes and the subsets of fixed length of P days for identifying γ(t)/β(t).
The system of Equations (20) and (21), after excluding C k , gives a formula for the ratio Substituting the ratio (22) into Equations (11) and (12) gives the following expressions for the basic and effective reproduction numbers R 0,k and R e,k : respectively, on the time interval [ν k−1 , ν k ] for k = 2, 3, . . . , m.
If the number of susceptible individuals at two consecutive time moments ν k−1 and ν k is the same, i.e., σ k−1 = σ k , then the number of newly infected individuals for this period is zero, and the reproduction number is zero. In other words, the infection is starting to decline. (7) Equation (7) and the conditions (17) and (18), written for the time interval [ν k−1 , ν k ], result in the following system for k = 2, 3, . . . , m:

Estimating β/γ as a Solution of BVP for Equation
The system of Equations (24) and (25) gives a formula for the ratio Substituting the ratio (26) into Equations (11) and (12) gives the expressions for the basic and effective reproduction numbers R 0,k and R e,k on the time interval [ν k−1 , ν k ] for k = 2, 3, . . . , m: If the number of removed individuals, i.e., ρ k−1 and ρ k , for two consecutive time moments ν k−1 and ν k is the same (ρ k−1 = ρ k ), then the number of newly infected individuals for this period is zero, and the reproduction ratio is also zero. In other words, the infection is diminishing.
3.3. Identifying R 0 (t) and R e (t) from Equation (5) Using LSM Let the dataset for the values of S and I (σ k and λ k , respectively) be divided into subsets of fixed length of P days, as shown in Figure 2.
Following the LSM, we construct the following functions: for k = P + 1, P + 2, . . . , m. Here, µ l is the weight coefficient of the corresponding nodes. The necessary conditions for the minimization of the function Ψ with respect to γ k β k and C k are as shown: Let us introduce the following notations: The necessary conditions (29) become a system of two linear equations for the unknowns γ k β k and C k : The system (32) gives the values of the ratio γ k β k and C k : Then, the reproduction numbers become the following: Note that, in the case when P = 1, the formulas in (34) are equivalent to the corresponding formulas in (23).
3.4. Identifying R 0 (t) and R e (t) from Equation (7) Using LSM Similarly, following the LSM, we construct the following functions: for k = P + 1, P + 2, . . . , m. Here, µ l is the weight coefficient of the corresponding nodes. The necessary conditions for the minimization of the function Φ with respect to After introducing the following notations: the necessary conditions (36) become a system of two linear equations for the unknowns β k γ k and C k : The system (39) yields the values of the ratio Then, the reproduction numbers become the following: Note that the LSM solution is equivalent to the BVP solution if P = 1.

Results
We applied the method for reproduction number identification to real COVID-19 data, available in [44], for the countries of the United States and Canada, and the states of Georgia, Texas, and Louisiana, for a period ending on 4 January 2023.
We conducted simulations using the approaches described in Section 3. The solutions to the BVP are very sensitive to random perturbations in data values, as seen in Figure 3, while Figure 4 presents the identified values using the LSM method.
The numerically identified reproduction numbers using BVP and LSM follow the same trend, although the LSM method over 7 or more days helps reduce the effect of reported data containing errors, e.g., due to not being reported daily. This is why the basic and effective reproduction numbers are identified over 3 different time periods: 7 days; 14 days; and 21 days. The results using LSM for the solutions of (6) and (8) are identical. For this reason, herein we present the results obtained using Equation (6). The results obtained from this work show an interesting relationship between the number of currently infectious individuals (active COVID-19 cases reported in [44]) and the effective reproduction number, namely: The local maximum/minimum values of the effective reproduction number R e (t) occur approximately three weeks before the local maximum/minimum values of the number of currently infectious individuals I(t).
This observed behavior is due to the fact that the reproduction number influences the rate of spread of an epidemic. This means that, if the reproduction number starts to increase/decrease, then this causes the infectious disease spread to rise/decline. The obtained numerical results are in agreement with this behavior. Finding mathematical proof of this statement is worth pursuing.
We illustrate the observations for the above relation by comparing the COVID-19 data for the daily infectious individuals I(t) and the computed effective reproduction number R e (t), based on a 21-day period, for the selected countries and states.

United States
The obtained results for the basic reproduction number R 0 (t) and the effective reproduction number R e (t) are given in Figure 4. The values for the last few months are lower compared to those prior to February 2022. Figure 5 compares the local maximum values of the number of currently infectious individuals and the corresponding local maxima of the identified effective reproduction number R e . This observation provides us with an instrument for predicting the time of the processes of decline in an epidemic wave. The shapes of the graphs for the currently infectious people and the effective reproduction number R e are similar. To demonstrate the difference, we present a graph with two curves, one for the number of infectious individuals, and another for the effective reproduction number R e , scaled by 1,500,000. Figure 6 presents the obtained results for R 0 (t) and R e (t). As expected, the shapes of the graphs for the currently infectious population and the effective reproduction number R e (t) are similar. Figure 7 compares the number of currently infectious individuals I(t) and the effective reproduction number R e (t), scaled by 50,000. Figure 7 confirms the connection between the local maximum values of the number of currently infectious people and the corresponding local maxima of the effective reproduction number.

Texas
The results for Texas, shown in Figures 8 and 9, show a similar tendency. Figure 9 clearly illustrates the connection between local maximum values of the number of currently infectious individuals and the corresponding local maxima of the effective reproduction number R e (t), multiplied by 150,000.

Georgia and Louisiana
Georgia and Louisiana have relatively small populations compared to Texas, Canada, and the United States. Figure 10 presents the results for R e (t), while Figure 11 displays I(t) and 15,000 R e (t) for the states of Georgia and Louisiana. These results clearly confirm the claim that the local maximum values of the effective reproduction number R e (t) occur a few weeks before the local maximum values of the number of currently infectious individuals I(t).

Discussion
This work presents an instrument for the study and analysis of the spread of infectious diseases within the scope of compartmental SIR-type models. We have developed efficient methods for determining the time-dependent basic and effective reproduction numbers, R 0 (t) and R e (t), through an inverse problem approach. We have applied the developed method to publicly reported COVID-19 data in [44] for Canada and the United States, as well as for the states of Texas, Georgia, and Louisiana. The method uses direct integration of equations resulting from the SIR system and allows for time-dependent infection and recovery rates, thus enabling the results to capture the changes in the reproduction numbers.
We applied the proposed methods to obtain explicit expressions for the time-dependent basic and effective reproduction numbers, R 0 (t) and R e (t), respectively. The inverse problem for the time-dependent reproduction numbers uses available data for the number of susceptible and infectious individuals. We developed two approaches for the identification of the reproduction numbers, namely, the Boundary Value Problem (BVP) and Least Squares Method (LSM), for the derived equations.
Next, we presented and analyzed the numerical results for the selected counties and states. The obtained results demonstrate a relationship between the values of the number of infected individuals I(t) and the scaled effective reproduction number R e (t). The curves follow a distinct pattern, where the local maxima and minima differ by a few weeks, with the effective reproduction number extrema preceding those of the reported data for I(t). In the future, we aim to provide a mathematical proof of the observed correlation between the maximum (and minimum) values of the number of currently infectious individuals and those of the effective reproduction number.
Mathematical modeling of infectious diseases has many known limitations. This study involves assumptions, such as that the reported data include all the cases; that the population under study is well mixed; and that infectious individuals leave the I(t) class and move directly into the R(t) class, although it is now known that some who recovered from COVID-19 individuals can be infected again. The proposed model does not consider asymptomatic and presymptomatic transmissions, which play an important role in spreading an epidemic [45]. Furthermore, numerous symptomatic cases of COVID-19 are unreported. Neglecting to account for these cases leads to artificially inflated transmission rates and, consequently, higher reproduction numbers.
We tested the developed algorithms against reported data for the active cases from [44]. However, we acknowledge that the assumptions for the reported data pose a limitation, since they are not exactly true for COVID-19. Negative tests do not guarantee that individuals are not currently infectious, and there are diagnostic delays (especially with the omicron variant) which may not be detected on the widely used lateral flow test kits until five or six days later, according to [46]. The SIR-type models assume mixing of the population, lack of reinfection, a constant population, and accuracy and completeness of the reported data. Therefore, the models may not capture all the complexities of the pandemic, and the results should be interpreted with caution.

Conclusions
This work presents a novel method for modeling epidemics by using an inverse problem approach to efficiently estimate time-dependent reproduction numbers. We solve the inverse problem by defining the boundary value problem (BVP) and the least squares method (LSM) problems, using datasets of the examined population obtained from publicly available COVID-19 data. The estimated values of the reproduction numbers provide a tool for predicting the spread of infectious diseases. Despite the limitations of the model and the data quality, this study presents a framework for analyzing the spread of infectious diseases. In particular, the observed relationship between the effective reproduction number and the infection data can provide insight into the behavior of the infection curve.  Data Availability Statement: All the data used in this work are available and published. The COVID-19 data used in this research are available at https://www.worldometers.info/coronavirus/.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: SIR Susceptible-Infectious-Removed BVP Boundary-Value Problem LSM Least-Squares Method